Learning Count Classifier Preferences of Malay Nouns
Abstract
We develop a data set of Malay lexemes labelled with count classifiers that are attested in raw or lemmatised corpora. A maximum entropy classifier based on simple, language-inspecific features generated from context tokens achieves about 50% F-score, or about 65% precision when a suite of binary classifiers is built to aid multi-class prediction of headword nouns. Surprisingly, numeric features are not observed to aid classification. This system represents a useful step for semi-supervised lexicography across a range of languages.
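As a rough illustration of what "features generated from context tokens" could look like, the following minimal sketch turns the tokens surrounding a headword noun into binary features. The window size of two tokens, the feature naming scheme, and the function name `context_features` are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
def context_features(tokens, index, window=2):
    """Build binary features from the tokens around tokens[index].

    Hypothetical helper: the window size and the feature-name format
    are illustrative assumptions, not details taken from the paper.
    """
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the headword noun itself
        pos = index + offset
        if 0 <= pos < len(tokens):
            # Encode relative position and lower-cased token, e.g. "tok[-1]=ekor"
            features[f"tok[{offset:+d}]={tokens[pos].lower()}"] = 1
    return features


# "Dia membeli tiga ekor kucing semalam": "He/she bought three cats yesterday",
# where "ekor" is the count classifier used with the noun "kucing" (cat).
sentence = "Dia membeli tiga ekor kucing semalam".split()
feats = context_features(sentence, sentence.index("kucing"))
```

Feature dictionaries of this kind could then be passed to any maximum entropy (multinomial logistic regression) learner, either as a single multi-class model or as one binary model per classifier.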
Similar resources
Web and Corpus Methods for Malay Count Classifier Prediction
We examine the capacity of Web and corpus frequency methods to predict preferred count classifiers for nouns in Malay. The observed F-score for the Web model of 0.671 considerably outperformed corpus-based frequency and machine learning models. We expect that this is a fruitful extension for Web–as–corpus approaches to lexicons in languages other than English, but further research is required i...
Multilingual Generation of Numeral Classifiers using a Common Ontology
In this paper, we present a solution to the problem of generating both Japanese and Korean numeral classifiers using semantic classes from an ontology. Most nouns must use a numeral classifier when they are quantified in languages such as Chinese, Japanese, Korean, Malay and Thai. In order to select an appropriate classifier, we propose an algorithm which associates classifiers with semantic cl...
Individuation and Quantification: Do bare nouns in Mandarin Chinese individuate?
Some have proposed that speakers of classifier languages such as Mandarin or Japanese, which lack count-mass syntax, have to rely on classifiers for acquiring individuated meanings of nouns (e.g., Borer 2005; Lucy 1992). This paper examines this view by looking at how Mandarin adults interpret bare nouns and use classifier knowledge to guide quantification in three experiments. Experiment 1 fou...
The Count-Mass Distinction of Abstract Nouns in Mandarin Chinese
The issue of whether nouns in Mandarin Chinese can be distinguished into count and mass nouns has been debated in recent literature. Unlike English, Mandarin Chinese is a language where nouns are not obviously count nouns or mass nouns. In fact, syntactically nouns in Mandarin are similar to mass nouns in English, as they cannot combine directly with numerals, but must combine with classifiers;...
Learning Subjective Nouns using Extraction Pattern Bootstrapping. 2003 Conference on Natural Language Learning (CoNLL-03), ACL SIGNLL
We explore the idea of creating a subjectivity classifier that uses lists of subjective nouns learned by bootstrapping algorithms. The goal of our research is to develop a system that can distinguish subjective sentences from objective sentences. First, we use two bootstrapping algorithms that exploit extraction patterns to learn sets of subjective nouns. Then we train a Naive Bayes classifier ...
Publication date: 2008